Worst Case Efficient Single and Multiple String Matching in the RAM Model

Author

  • Djamal Belazzougui
Abstract

In this paper, we explore worst-case solutions for the problems of single and multiple matching on strings in the word RAM model with word length w. In the first problem, we have to build a data structure based on a pattern p of length m over an alphabet of size σ such that we can answer the following query: given a text T of length n, where each character is encoded using log σ bits, return the positions of all occurrences of p in T (in the following, occ denotes the number of reported occurrences). For the multi-pattern matching problem, we have a set S of d patterns of total length m, and a query on a text T consists of finding all positions of all occurrences in T of the patterns in S. As each character of the text is encoded using log σ bits and we can read w bits in constant time in the RAM model, we assume that we can read up to Θ(w / log σ) consecutive characters of the text in one time step. This implies that the fastest possible query time for both problems is O(n log σ / w + occ). In this paper we present several results for both problems that come close to this best possible query time. We first present two linear-space data structures, one for each problem: the first answers single pattern matching queries in time O(n(1/m + log σ / w) + occ), while the second answers multiple pattern matching queries in time O(n((log d + log y + log log m)/y + ...
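The packing assumption behind the O(n log σ / w + occ) bound can be illustrated with a small sketch. This is not the paper's data structure; it only shows how many w-bit word reads are needed to scan a packed text. The function names (`bits_per_char`, `pack_text`, `min_word_reads`) and the default word length w = 64 are assumptions made for this illustration.

```python
def bits_per_char(sigma: int) -> int:
    """Bits used per character for an alphabet of size sigma (ceil(log2 sigma), at least 1)."""
    return max(1, (sigma - 1).bit_length())


def pack_text(codes, sigma, w=64):
    """Pack integer character codes into w-bit words, lowest bits first.

    Returns the list of words and the number of characters stored per word.
    """
    b = bits_per_char(sigma)
    per_word = w // b  # this is the Theta(w / log sigma) characters per word
    words = []
    for i in range(0, len(codes), per_word):
        word = 0
        for j, c in enumerate(codes[i:i + per_word]):
            word |= c << (j * b)
        words.append(word)
    return words, per_word


def min_word_reads(n, sigma, w=64):
    """Number of w-bit words needed to hold n characters: ceil(n * log sigma / w)."""
    b = bits_per_char(sigma)
    return (n * b + w - 1) // w


# Hypothetical example: a DNA-like text over sigma = 4, so 2 bits per character.
codes = [0, 1, 2, 3] * 25  # n = 100 characters
words, per_word = pack_text(codes, sigma=4)
```

With these assumed parameters, 32 characters fit in each 64-bit word, so any algorithm that must inspect the whole text performs at least `min_word_reads(100, 4)` word reads, plus the time to report the occ occurrences.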

Similar resources

Fast Packed String Matching for Short Patterns

Searching for all occurrences of a pattern in a text is a fundamental problem in computer science with applications in many other fields, such as natural language processing, information retrieval, and computational biology. In the last two decades, a general trend has emerged of exploiting the power of the word RAM model to speed up classical string matching algorithms. In ...

Multiple Keyword Pattern Matching using Position Encoded Pattern Lattices

Formal concept analysis is used as the basis for two new multiple keyword string pattern matching algorithms. The algorithms addressed are built upon a so-called position encoded pattern lattice (PEPL). The algorithms presented are in conceptual form only; no experimental results are given. The first algorithm to be presented is easily understood and relies directly on the PEPL for matching. It...

Towards a Very Fast Multiple String Matching Algorithm for Short Patterns

Multiple exact string matching is one of the fundamental problems in computer science and finds applications in many other fields, including computational biology and intrusion detection. It turns out that short patterns appear in many instances of such problems and, in most cases, significantly affect the performance of the algorithms. Recent solutions in the field of string matching try to expl...

Packed Compact Tries: A Fast and Efficient Data Structure for Online String Processing

In this paper, we present a new data structure called the packed compact trie (packed c-trie) which stores a set S of k strings of total length n in n log σ + O(k log n) bits of space and supports fast pattern matching queries and updates, where σ is the size of an alphabet. Assume that α = log_σ n letters are packed in a single machine word on the standard word RAM model, and let f(k, n) denote ...

Efficient string-matching allowing for non-overlapping inversions

Inversions are a class of chromosomal mutations, widely regarded as one of the major mechanisms for reorganizing the genome. In this paper we present a new algorithm for the approximate string matching problem allowing for non-overlapping inversions which runs in O(nm) worst-case time and O(m^2) space, for a character sequence of size n and pattern of size m. This improves upon a previous O(nm^2)...


Publication date: 2010